Measuring the Nonlinear Biasing Function 
from a Galaxy Redshift Survey 



Yair Sigad 1 , Enzo Branchini 2 , & Avishai Dekel 1 



ABSTRACT 

We present a simple method for evaluating the nonlinear biasing function of 
galaxies from a redshift survey. The nonlinear biasing is characterized by the 
conditional mean of the galaxy density fluctuation given the underlying mass 
density fluctuation, $\langle\delta_g|\delta\rangle$, or by the associated parameters of mean biasing $\hat{b}$ and
nonlinearity $\tilde{b}$ (following Dekel & Lahav 1999). Using the distribution of galaxies
in cosmological simulations, at smoothing of a few Mpc, we find that $\langle\delta_g|\delta\rangle$ can
be recovered to a good accuracy from the cumulative distribution functions of
galaxies and mass, $C_g(\delta_g)$ and $C(\delta)$, despite the biasing scatter. Then, using a
suite of simulations of different cosmological models, we demonstrate that $C(\delta)$
can be approximated in the mildly nonlinear regime by a cumulative log-normal
distribution of $1+\delta$ with a single parameter $\sigma$, with deviations that are small
compared to the difference between $C_g$ and $C$. Finally, we show how the nonlinear
biasing function can be obtained with adequate accuracy directly from the
observed C g in redshift space. Thus, the biasing function can be obtained from 
counts in cells once the rms mass fluctuation at the appropriate scale is assumed a 
priori. The relative biasing function between different galaxy types is measurable 
in a similar way. The main source of error is sparse sampling, which requires that 
the mean galaxy separation be smaller than the smoothing scale. Once applied 
to redshift surveys such as PSCz, 2dF, SDSS, or DEEP, the biasing function can 
provide valuable constraints on galaxy formation and structure evolution. 

1. INTRODUCTION 

The fact that galaxies of different types cluster differently (e.g., Dressler 1980; Lahav, 
Nemiroff & Piran 1990; Santiago & Strauss 1992; Loveday et al. 1995; Hermit et al. 1996; 
Guzzo et al. 1997) indicates that the galaxy distribution is in general biased compared to 
the underlying mass distribution. Cosmological simulations confirm that halos and galaxies 
must be biased (e.g., Cen & Ostriker 1992; Kauffmann, Nusser & Steinmetz 1997; Blanton 
et al. 1999; Somerville et al. 2000). The biasing becomes even more pronounced at high 
redshift, as predicted by theory (e.g., Kaiser 1986; Davis et al. 1985; Bardeen et al. 1986; 
Dekel & Rees 1987; Mo & White 1996; Bagla 1998; Jing & Suto 1998; Wechsler et al. 1998), 
and confirmed by the strong clustering of galaxies observed at z ~ 3 (Steidel et al. 1996; 
1998). Knowing the biasing scheme is crucial for extracting dynamical information and 
cosmological constants from the observed galaxy distribution, and may also be very useful 
for understanding the process and history of galaxy formation. 

The simplest possible biasing model relating the density fluctuation fields of matter and
galaxies, $\delta$ and $\delta_g$, is the deterministic and linear relation $\delta_g(\mathbf{x}) = b\,\delta(\mathbf{x})$, where $b$ is a constant
linear biasing parameter. However, this is at best a crude approximation, because it is not
self-consistent (e.g., it does not prevent $\delta_g$ from becoming smaller than $-1$ when $b > 1$) and
is not preserved in time. At any given time, scale and galaxy type, the biasing is expected
in general to be nonlinear, i.e., $b$ should vary as a function of $\delta$. The nonlinearity of dark-matter
halo biasing (as well as its dependence on scale, mass and time) is approximated
fairly well by the model of Mo & White (1996), based on the extended Press-Schechter
formalism (Bond et al. 1991). Improved approximations have been proposed by Jing (1998),
Catelan et al. (1998), Sheth & Tormen (1999) and Porciani et al. (1999). The nonlinearity is quantified
further for halos and galaxies using cosmological $N$-body simulations with semi-analytic
galaxy formation (e.g., Somerville et al. 2000). The biasing is also expected, in general, to
be stochastic, in the sense that a range of values of $\delta_g$ is possible for any given value of $\delta$. For
example, if the biasing is nonlinear on one scale, it should be different and non-deterministic
on any other scale. The origin of the scatter is shot noise as well as the influence of physical
quantities other than mass density (e.g., velocity dispersion, the dimensionality of the local
deformation tensor which affects the shape of the collapsing object, etc.) on the efficiency
of galaxy formation.

Dekel & Lahav (1999) have proposed a general formalism for galaxy biasing that separates
nonlinearity and stochasticity in a natural way. The density fields are treated as
random fields, and the biasing is fully characterized by the conditional probability distribution
function $P(\delta_g|\delta)$. The constant linear biasing factor $b$ is replaced by a mean biasing
function,

$$\langle\delta_g|\delta\rangle = b(\delta)\,\delta , \qquad (1)$$

which can in principle take a wide range of functional forms, restricted by definition to have
$\langle\delta_g\rangle = 0$ and $\langle\delta_g|\delta\rangle \geq -1$ for any $\delta$. The stochasticity is expressed by the higher moments
about this mean, such as the conditional variance

$$\sigma_b^2(\delta) = \langle\epsilon^2|\delta\rangle/\sigma^2 , \qquad \epsilon = \delta_g - \langle\delta_g|\delta\rangle , \qquad (2)$$

scaled for convenience by the variance of mass fluctuations, $\sigma^2 = \langle\delta^2\rangle$. To second order, the
biasing function $b(\delta)$ can be characterized by two parameters: the moments $\hat{b}$ and $\tilde{b}$,

$$\hat{b} = \langle b(\delta)\,\delta^2\rangle/\sigma^2 \quad {\rm and} \quad \tilde{b}^2 = \langle b^2(\delta)\,\delta^2\rangle/\sigma^2 . \qquad (3)$$

The parameter $\hat{b}$ is the natural extension of the linear biasing parameter, measuring the
slope of the linear regression of $\delta_g$ on $\delta$, and $\tilde{b}/\hat{b}$ is a useful measure of nonlinearity. The
stochasticity is characterized independently by a third parameter, $\sigma_b^2 = \langle\epsilon^2\rangle/\sigma^2$. As has
been partly explored by Dekel & Lahav (1999), these parameters should enter any nonlinear
analysis aimed at extracting the cosmological density parameter $\Omega$ from a galaxy redshift
survey, and are therefore important to measure.
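
As an illustration of how the three biasing parameters could be estimated from paired samples of the smoothed fields, the following is a minimal sketch and not the authors' code. The function name `biasing_moments`, the equal-count binning used to estimate the conditional mean, and the default number of bins are all assumptions made for this example. It uses the identity $\langle b(\delta)\,\delta^2\rangle = \langle\delta_g\,\delta\rangle$, which follows from $b(\delta)\,\delta = \langle\delta_g|\delta\rangle$.

```python
import math

def biasing_moments(delta, delta_g, n_bins=100):
    """Return (b_hat, b_tilde, sigma_b) from paired density samples.

    delta, delta_g: equal-length sequences of mass and galaxy fluctuations.
    n_bins: number of equal-count bins in delta (an illustrative choice).
    """
    n = len(delta)
    sigma2 = sum(d * d for d in delta) / n            # sigma^2 = <delta^2>
    # b_hat = <b(delta) delta^2>/sigma^2 = <delta_g delta>/sigma^2, because
    # b(delta) delta = <delta_g|delta> (law of total expectation).
    b_hat = sum(d * g for d, g in zip(delta, delta_g)) / n / sigma2
    # Estimate <delta_g|delta> as the mean of delta_g in equal-count bins.
    order = sorted(range(n), key=lambda i: delta[i])
    cond_mean = [0.0] * n
    for k in range(n_bins):
        members = order[k * n // n_bins:(k + 1) * n // n_bins]
        if not members:
            continue
        mean_g = sum(delta_g[i] for i in members) / len(members)
        for i in members:
            cond_mean[i] = mean_g
    # b_tilde^2 = <b^2 delta^2>/sigma^2 = <<delta_g|delta>^2>/sigma^2
    b_tilde = math.sqrt(sum(m * m for m in cond_mean) / n / sigma2)
    # sigma_b^2 = <eps^2>/sigma^2 with eps = delta_g - <delta_g|delta>
    eps2 = sum((g - m) ** 2 for g, m in zip(delta_g, cond_mean)) / n
    sigma_b = math.sqrt(eps2 / sigma2)
    return b_hat, b_tilde, sigma_b
```

Because the conditional mean is piecewise constant within each bin, a coarse binning slightly underestimates $\tilde{b}$ and inflates $\sigma_b$; the bin count should therefore be chosen with the sample size in mind.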

In this paper we propose a simple method to measure the biasing function $b(\delta)$ and the
associated parameters $\hat{b}$ and $\tilde{b}$ from observed data that are either already available, such
as the PSCz redshift survey (Saunders et al. 2000), or that will soon become available,
such as the redshift surveys of 2dF (Colless 1999) and SDSS (e.g., Loveday et al. 1998) and
high-redshift surveys such as DEEP (Davis & Faber 1999). Alternative methods have been
proposed to measure the biasing function, using the cumulant correlators of the observed
distribution of galaxies in redshift surveys (Szapudi 1998) or their bispectrum (Matarrese,
Verde & Heavens 1997; Verde et al. 1998).

We first show in §2, using halos and galaxies in $N$-body simulations, that the difference
between the cumulative distribution functions (CDFs) of galaxies and mass can be straightforwardly
translated into $\langle\delta_g|\delta\rangle$ despite the scatter in the biasing scheme. Then, in §3, we
demonstrate that for our purpose $C(\delta)$ is insensitive to the cosmological model and can be
approximated robustly by a cumulative log-normal distribution. This means that we do not
need to observe $C(\delta)$, which is hard to do; we only need to measure $C_g(\delta_g)$ and, independently,
the rms value $\sigma$ of the mass fluctuations on the same scale. In §4 we slightly modify
the method to account for redshift-space distortions, and use mock galaxy catalogs from
$N$-body simulations to evaluate the associated errors. Finally, in §5, we estimate the errors
due to sparse sampling and finite volume. The method and its applications to existing
and future data are discussed in the concluding section.



2. BIASING FUNCTION FROM DISTRIBUTION FUNCTIONS 

Let $C_g(\delta_g)$ and $C(\delta)$ be the cumulative distribution functions of the density fluctuations
of galaxies and mass respectively (at a given smoothing window). Had the biasing relation
been deterministic and monotonic, it could have been determined straightforwardly from the
difference between these CDFs at given percentiles,

$$\delta_g(\delta) = C_g^{-1}[C(\delta)] , \qquad (4)$$

where $C_g^{-1}$ is the inverse function of $C_g$.$^3$ In the presence of scatter in the biasing scheme,
strict monotonicity is violated, but it is possible that $C_g^{-1}[C(\delta)]$ is still a good approximation
for $\langle\delta_g|\delta\rangle$, as long as the latter is monotonic.$^4$ The validity of this approximation is addressed
in the present section.
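
The percentile matching of equation (4) can be sketched with empirical CDFs built from two sets of smoothed density values, which need not be paired point by point. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the convention that the empirical CDF equals $(i+1)/n$ at the $i$-th sorted sample is one choice among several.

```python
from bisect import bisect_right

def empirical_cdf(sorted_vals, x):
    """Fraction of samples <= x, for a pre-sorted list."""
    return bisect_right(sorted_vals, x) / len(sorted_vals)

def inverse_cdf(sorted_vals, p):
    """Value at cumulative probability p, by linear interpolation in rank.

    Uses the same convention as empirical_cdf: C = (i + 1)/n at sample i.
    """
    n = len(sorted_vals)
    pos = min(max(p * n - 1.0, 0.0), n - 1.0)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return (1.0 - frac) * sorted_vals[lo] + frac * sorted_vals[hi]

def biasing_from_cdfs(delta_samples, delta_g_samples, delta_values):
    """Approximate <delta_g|delta> by delta_g(delta) = C_g^{-1}[C(delta)]."""
    mass = sorted(delta_samples)
    gal = sorted(delta_g_samples)
    return [inverse_cdf(gal, empirical_cdf(mass, d)) for d in delta_values]
```

For a deterministic, monotonic relation the sketch returns the input relation at the sampled points; with scatter it returns the percentile-matched curve whose accuracy is the subject of this section.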

We use two cosmological $N$-body simulations in which both halos and galaxies were
identified (Kauffmann et al. 1999). The cosmological models are τCDM (with $\Omega_m = 1$ and
$h = 0.5$) and ΛCDM (with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ and $h = 0.7$). $N = 256^3$ particles were
simulated in a periodic box of comoving size 85 and 141 $h^{-1}$ Mpc respectively (corresponding
to a mass resolution of $1.0 \times 10^{10}\,h^{-1} M_\odot$ and $1.4 \times 10^{10}\,h^{-1} M_\odot$). The simulations were run
using a parallel adaptive P$^3$M code kindly made available by the Virgo Consortium (see
Jenkins et al. 1998) as part of the "GIF" collaboration between the HU Jerusalem and the
MPA Munich. The present epoch is defined by a linear rms density fluctuation in a top-hat
sphere of radius 8 $h^{-1}$ Mpc of $\sigma_8 = 0.6$ in the τCDM simulation and $\sigma_8 = 0.9$ in the
ΛCDM simulation. Dark-matter halos were identified at densely sampled time steps using
a friends-of-friends algorithm. Galaxies were identified inside these halos by applying in
retrospect semi-analytic models (SAMs) of galaxy formation (Kauffmann et al. 1999). The
SAMs simulate the important physical processes of galaxy formation such as gas cooling,
star formation and supernova feedback. At different times in the redshift range 0 to 3, we
select halos by mass and galaxies by luminosity or type. We then compute density fields
by applying top-hat smoothing with radii in the range 5-15 $h^{-1}$ Mpc. We report detailed
results for the case of 8 $h^{-1}$ Mpc smoothing, and refer to the scale dependence in several
places.

$^3$ A similar relation has been used by Narayanan & Weinberg (1998) for "debiasing" the galaxy density
field for the purpose of dynamical reconstruction.

$^4$ The absence of spiral galaxies in the centers of rich clusters may result in a non-monotonic biasing
function for this type of galaxies at small smoothing scales, as hinted in Blanton et al. (1999). However,
using the simulations described in this section, Somerville et al. (2000) do not find non-monotonicity for late
type galaxies at 8 $h^{-1}$ Mpc smoothing, as used in Figure 3 below.

Fig. 1. CDFs and the biasing function at different redshifts for τCDM halos with
$M > 10^{12}\,h^{-1} M_\odot$ and TH8 smoothing. Top panels: the matter $C(\delta)$ (solid) and the halo $C_g(\delta_g)$ (dashed).
Also shown is a log-normal distribution (dotted), largely hidden behind the exact mass distribution.
Bottom panels: $\delta_g(\mathbf{x})$ versus $\delta(\mathbf{x})$ at grid points within the simulation box. The true mean biasing
function $\langle\delta_g|\delta\rangle$ is marked by the filled circles with error bars. Shown in comparison (solid line) is
the approximation obtained by equation (4) from the CDFs and the corresponding $1\sigma$ error range
(dotted).

The figures of this section illustrate the success of the approximation, equation (4), in
several different cases based on the τCDM simulation, with top-hat smoothing of radius
8 $h^{-1}$ Mpc (hereafter TH8, or TH$X$ for radius $X\,h^{-1}$ Mpc), and at different redshifts. Figure 1
refers to halos of mass $> 10^{12}\,h^{-1} M_\odot$ ($> 100$ particles). On the top we show the
cumulative distributions of halos and underlying mass fluctuations, $C_g(\delta_g)$ and $C(\delta)$ (our
notation does not distinguish between halos and galaxies). The errors in $C_g$ are computed
from 20 bootstrap simulations of the halo field. The errors in $C$, estimated in the same
way, are smaller by an order of magnitude and are therefore not shown. The bottom panels
show a point-by-point comparison of the TH8 fields of $\delta_g(\mathbf{x})$ and $\delta(\mathbf{x})$ at points randomly
chosen (1:8) from a uniform grid of spacing 2.64 $h^{-1}$ Mpc within the simulation box. The
true mean biasing function $\langle\delta_g|\delta\rangle$ is marked by the filled circles with attached error bars. It
is computed by a local linear regression of $\delta_g$ on $\delta$ within each bin of $\delta$, adopting the value of
the fitted line at the center of the bin (only every other bin is shown). Shown in comparison
(solid line) is the approximation for $\langle\delta_g|\delta\rangle$ obtained by equation (4) from the CDFs, and the
corresponding $1\sigma$ error range based on the bootstrap realizations (dotted lines).
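
The local linear regression used for the true mean biasing function can be sketched as follows. This is a hedged illustration rather than the authors' procedure in detail: the function name, the half-open bin convention, and the decision to skip bins with fewer than two points are assumptions made here.

```python
def mean_biasing_by_regression(delta, delta_g, edges):
    """Binned local linear regression of delta_g on delta.

    For each bin [edges[k], edges[k+1]) a straight line is fitted by least
    squares and evaluated at the bin center.
    Returns a list of (bin_center, fitted_value) pairs.
    """
    result = []
    for left, right in zip(edges[:-1], edges[1:]):
        pts = [(d, g) for d, g in zip(delta, delta_g) if left <= d < right]
        if len(pts) < 2:
            continue  # too few points to fit a line in this bin
        n = len(pts)
        mean_d = sum(d for d, _ in pts) / n
        mean_g = sum(g for _, g in pts) / n
        var_d = sum((d - mean_d) ** 2 for d, _ in pts)
        if var_d == 0.0:
            continue  # all points share one delta value; slope undefined
        slope = sum((d - mean_d) * (g - mean_g) for d, g in pts) / var_d
        center = 0.5 * (left + right)
        # Value of the fitted line at the center of the bin
        result.append((center, mean_g + slope * (center - mean_d)))
    return result
```

Evaluating the fitted line at the bin center, rather than taking the plain bin average, reduces the bias that arises when points are unevenly distributed within a bin.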

As can be seen in Figure 1, the approximation is excellent over most of the $\delta$ range:
the deviation at $z = 0$ is within the $1\sigma$ errors up to $\delta \simeq 1.4$ (corresponding to $\sim 97\%$ of the
volume). Systematic deviations show up at higher $\delta$ values, where the scatter becomes larger
and the mean biasing function flatter, making the deviations from monotonicity larger. In
order to quantify the quality of the approximation, we average the residuals (scaled by $\sigma_g$):

$$\Delta = \frac{1}{N_{\rm bins}\,\sigma_g^2} \sum_{\delta\text{-bins}} \left[\delta_g(\delta) - \langle\delta_g|\delta\rangle\right]^2 , \qquad (5)$$

where $\delta_g(\delta)$ is obtained via equation (4). We exclude the poorly recovered high-density tail
by limiting the summation to those $N_{\rm bins}$ bins of $\delta$ for which $C(\delta) < 0.99$ and $C_g(\delta_g) < 0.99$.
The values of $\Delta$ in the various cases studied, including halos and galaxies in τCDM and
ΛCDM at different redshifts, are listed in Table 1. For example, for the halos shown in
Figure 1 at $z = 0$ we obtain $\Delta = 0.08$, indicating that the typical error in the approximation
$\delta_g(\delta)$ is small compared to the actual scatter $\sigma_g$ in the halo density field.
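
The residual statistic of equation (5), with the 99% cut on both CDFs, could be evaluated along the following lines. This is a minimal sketch; the function name and the argument layout (one entry per $\delta$ bin) are assumptions for illustration.

```python
def recovery_error(approx, true_mean, sigma_g, c_mass, c_gal, c_max=0.99):
    """Mean squared residual between the CDF-based approximation and the
    true mean biasing function, scaled by sigma_g^2.

    approx, true_mean: values of delta_g(delta) and <delta_g|delta> per bin.
    c_mass, c_gal: C(delta) and C_g(delta_g) per bin, used to drop the
    poorly recovered high-density tail.
    """
    kept = [(a - t) ** 2
            for a, t, cm, cg in zip(approx, true_mean, c_mass, c_gal)
            if cm < c_max and cg < c_max]
    if not kept:
        raise ValueError("no bins pass the CDF cut")
    return sum(kept) / (len(kept) * sigma_g ** 2)
```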

A complementary approach for quantifying the quality of the approximation is to test
how well it recovers the values of the moments of the biasing function, $\hat{b}$ and $\tilde{b}$. In Table 1
we present the values of these moments for the different cases, as computed directly from
the simulation and as approximated by $\delta_g(\delta)$ (denoted by a subscript "a"). These biasing
parameters are computed based on 99.9% of the volume, excluding the very highest density
peaks, where the error is large (the only exception is at $z = 3$, where we use only 99% of the
volume because the errors are even larger). For the halos shown in Figure 1 at $z = 0$, we see
that $\hat{b}$ and $\tilde{b}$ are recovered with errors of 1% and 3% respectively.

The middle panels of Figure 1 refer to $z = 1$, where $\hat{b} \simeq 2.2$. The approximation of
equation (4) holds well in this case up to $\delta \simeq 0.7$, which corresponds to $\sim 98\%$ of the volume.
The approximation remains good despite the large scatter (compared to the $z = 0$ case)
because the steepness of the biasing function helps maintain reasonable monotonicity.
The goodness of the recovery of the biasing function, with $\Delta = 0.07$, is similar to the $z = 0$
case. The parameters $\hat{b}$ and $\tilde{b}$ are recovered with an accuracy of $\sim 5\%$ (Table 1). The
right panels of Figure 1 demonstrate that the approximation is valid even at $z = 3$, where
the biasing is extremely strong, $\hat{b} \simeq 6.6$. The recovery of the biasing function is still good,
$\Delta = 0.20$, and its moments are approximated to within $\sim 2\%$.

Fig. 2. Same as Figure 1, but for bright galaxies of $M_B < -21$ rather than massive halos.

The halo biasing function in the ΛCDM cosmology is recovered, in general, with similar
success, as can be seen in the top part of Table 1. Note that in this case the recovery actually
improves at higher redshift. This reflects the fact that in ΛCDM the halo biasing scatter
becomes smaller at higher redshift (see Somerville et al. 2000, Fig. 17). It results from the
smaller shot noise due to the higher abundance of high-redshift halos in ΛCDM compared
to τCDM.

Figure 2 is analogous to Figure 1, but now for bright galaxies of $M_B - 5\log h < -19.5$.
The recovery is again quantified in Table 1; it is quite similar to the case of halos. The
typical error is $\Delta < 0.08$, and the biasing parameters are recovered with an error of a couple
to a few percent.

The performance of our method has been tested for smoothing scales in the range
5-15 $h^{-1}$ Mpc. For the τCDM model, we find that the quality of the approximation is
practically independent of scale throughout this range; the relative error in the biasing
parameters is at the level of a few percent, and $\Delta$ is in the range 0.1 to 0.2, rather similar
to the values quoted in Table 1 for TH8 smoothing. On the other hand, for ΛCDM we do
find that the performance improves with increasing smoothing scale. With TH15 at $z = 0$,
for halos (or galaxies), the errors in the biasing parameters reduce to below 3% (1%), and
$\Delta = 0.07$ (0.04), while for TH5 smoothing these errors are about 4 times larger. This
difference between the two models can be attributed to a difference in the scale dependence
of the biasing scatter (Somerville et al. 2000, Figure 16), which translates to an error in our
method via increased deviations from monotonicity.

Before we proceed with the biasing relative to the underlying mass, we note that the
relative biasing function of two different galaxy types, $\langle\delta_{g2}|\delta_{g1}\rangle$, is directly observable
from a redshift survey. Again, for a deterministic and monotonic biasing process one has

$$\delta_{g2}(\delta_{g1}) = C_{g2}^{-1}[C_{g1}(\delta_{g1})] , \qquad (6)$$

and when biasing scatter is present, the question is to what extent equation (6) provides a
valid approximation for the true relative biasing function.

Figure 3 shows the relative biasing function of "early" and "late" type galaxies in the
two cosmological models, at $z = 0$ and with TH8 smoothing. These galaxy types are distinguished
in the SAM $N$-body simulations according to the ratio of bulge to total luminosity
in the V band being larger or smaller than 0.4 respectively. The large scatter in the relative
biasing, due to errors in the two density fields, is reduced by including all the galaxies,
without applying a luminosity cut.

Fig. 3. The relative biasing of early versus late type galaxies, at $z = 0$, for τCDM (right panels)
and ΛCDM (left panels). The symbols are as in Figure 1.



As can be seen in the last three columns of Table 1, the quality of the recovery of the
relative biasing function is not as good as in the case of the absolute biasing of galaxies or
halos. The values of $\Delta$ range from 0.2 to 0.56, compared to 0.08 to 0.16 in the former cases.
This is expected, because in the case of relative biasing the two density fields contribute to
the stochasticity or deviation from monotonicity (see also the important role of sampling
errors in the recovery of the biasing function, §4.2). The moments of the relative biasing
function are recovered with better than 15% accuracy at $z \leq 1$, and to $\sim 25\%$ accuracy at
$z = 3$, in both cosmologies. In calculating the moments, unlike in Figure 3, a luminosity cut
has been applied, $M_B - 5\log h < -19.5$, and 99% of the volume was used. The fact that the
$\Delta$ values are still significantly smaller than unity and the errors in the biasing parameters are
not larger than 25% indicates that our method is capable of yielding meaningful estimates
of the relative biasing function. In both cosmologies, the relative biasing is almost scale
independent in the range 5-15 $h^{-1}$ Mpc, as is the quality of the reconstruction.



3. THE MASS CDF: ROBUST AND LOGNORMAL 

Large redshift surveys provide a rich body of data for mapping the galaxy density field
in extended regions of space and computing its CDF with adequate accuracy. However,
direct mapping of the mass density field is much harder. For example, POTENT reconstruction
from peculiar velocities (Dekel, Bertschinger & Faber 1990; Dekel et al. 1999;
Dekel 2000) yields the mass distribution in our local cosmological neighborhood (even out
to $\sim 100\,h^{-1}$ Mpc), which in principle enables direct mapping of the local biasing field. However,
the sparse and noisy data limit the mass reconstruction to low resolution ($\sim 10\,h^{-1}$ Mpc)
compared to the volume sampled, which introduces large cosmic scatter in the mass CDF.
New accurate data nearby, based on SBF distances (Tonry et al. 1997), do enable a promising
resolution of a few Mpc (see Dekel 2000), but limited to inside the local sphere of radius
$\sim 30\,h^{-1}$ Mpc.

What makes the method proposed here feasible is the fact that the mass CDF is only
weakly sensitive to variations in the cosmological scenario within the range of models that
are currently considered as viable models for the formation of large-scale structure (e.g.,
Primack 1998; Bahcall et al. 1999). It has been proposed that the mass PDF can be well
approximated by a log-normal distribution in $\rho/\bar{\rho} = 1 + \delta$ (e.g., Coles & Jones 1991; Kofman
et al. 1994), and it has since been argued that this approximation becomes poor for certain
power spectra and at the tails of the distribution (Bernardeau 1994; Bernardeau & Kofman
1995). In this section, we investigate the robustness of $C(\delta)$ for our purpose here, namely,
in comparison with the typical difference between the CDFs of galaxies and mass (i.e., the
mean biasing function) which we are trying to approximate.

We use for this purpose a suite of $N$-body simulations of six different cosmological
models. In addition to the two high-resolution simulations of τCDM and ΛCDM used in the
previous section, we have simulated three random realizations of each of the three following
models (all using a Hubble constant of $h = 0.5$): standard CDM (SCDM; $\Omega_m = 1$ with
spectral index $n = 1$), an extreme open CDM (OCDM; $\Omega_m = 0.2$, $n = 1$), and an extreme
tilted CDM (TCDM; $\Omega_m = 1$, $n = 0.6$). These simulations were run by Ganon et al. (2000,
in preparation) using a PM code (by Bertschinger & Gelb 1991), with $128^3$ particles in a
256 $h^{-1}$ Mpc box. The present epoch is defined in these simulations by a linear fluctuation
amplitude of $\sigma_8 = 1.0$. A similar simulation was run using a constrained realization (CR) of
the local universe based on the galaxy density in the IRAS 1.2Jy redshift survey under the
assumption of no biasing (Kolatt et al. 1996), with $\Omega_m = 1$ and the present defined in this
case by $\sigma_8 = 0.7$.

Figure 4 (left) shows for the different models the deviations $\Delta C(\delta)$ of the mass CDFs,
smoothed TH8, from a cumulative log-normal distribution with the same $\sigma$. The log-normal
probability density is

$$P(\delta) = \frac{1}{\rho}\,\frac{1}{\sqrt{2\pi}\,s} \exp\left[-\frac{(\ln\rho - m)^2}{2s^2}\right] , \qquad (7)$$

where

$$\rho = 1 + \delta , \quad m = -0.5\ln(1+\sigma^2) , \quad s^2 = \ln(1+\sigma^2) \quad {\rm and} \quad \sigma^2 = \langle\delta^2\rangle . \qquad (8)$$

The cumulative log-normal distribution is obtained by integration,

$$C_{\ln}(\delta) = \frac{1}{2}\left[1 + {\rm erf}\left(\frac{\ln\rho - m}{\sqrt{2}\,s}\right)\right] , \qquad (9)$$

where

$${\rm erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt . \qquad (10)$$

For the cases of OCDM, TCDM and SCDM, the CDF is obtained from the three simulations
of each model put together. The errors are similar in the different cases; we therefore plot
representative error bars only for the τCDM case.

Fig. 4. Robustness of the mass CDF to cosmological models. Left: The deviation $\Delta C$ of the
CDFs from a cumulative log-normal distribution, for various CDM cosmologies at $z = 0$: τCDM
(solid); ΛCDM (long-dashed); OCDM (dot-dashed); TCDM (dashed); SCDM (dotted); and CR
(dot-long-dashed). Right: The approximation $\delta_g(\delta)$ based on the exact $C(\delta)$ (solid curve, with
dotted lines marking $1\sigma$ errors), versus the one based on the approximation $C(\delta) = C_{\ln}(\delta)$ instead
(dashed curve). They lie almost on top of each other. The true mean biasing function $\langle\delta_g|\delta\rangle$ is
shown for comparison (points with error bars). All are for halos with $M > 10^{12}\,h^{-1} M_\odot$ in the
τCDM simulation.

In all the realizations that had random Gaussian initial conditions, the deviation from
lognormality is less than 2%. The constrained realization shows somewhat larger deviations,
but even in this case they never exceed 5%. These deviations are indeed smaller than the
typical differences between $C_g(\delta)$ and $C(\delta)$, which are on the order of 10% (see Figure 1).
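
For concreteness, the cumulative log-normal distribution with the single parameter $\sigma$ can be evaluated as in the sketch below. It assumes the standard error-function convention of Python's `math.erf`; the function name `lognormal_cdf` is chosen here for illustration and does not come from the paper.

```python
import math

def lognormal_cdf(delta, sigma):
    """Cumulative log-normal distribution of rho = 1 + delta.

    sigma is the rms of delta; the log-normal parameters follow from it as
    s^2 = ln(1 + sigma^2) and m = -0.5 * s^2, so that <rho> = 1 and
    <delta^2> = sigma^2.
    """
    if delta <= -1.0:
        return 0.0  # densities cannot be negative
    s2 = math.log(1.0 + sigma ** 2)
    m = -0.5 * s2
    x = (math.log(1.0 + delta) - m) / math.sqrt(2.0 * s2)
    return 0.5 * (1.0 + math.erf(x))
```

With such a function standing in for $C(\delta)$, the percentile matching of equation (4) needs only the observed galaxy CDF and an assumed value of $\sigma$.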

In order to evaluate how important the contribution of $\Delta C$ is to the error in the recovery
of $\langle\delta_g|\delta\rangle$, we compare in the right panel of Figure 4 the true $\langle\delta_g|\delta\rangle$ in the τCDM simulation
with two approximations $\delta_g(\delta)$ based on equation (4), one using the true matter CDF and the
other replacing it with a cumulative log-normal distribution of the same $\sigma$. The results of the
two approximations are very similar; the differences between them seem to be much smaller
than the differences between each of them and the true biasing function $\langle\delta_g|\delta\rangle$. We can
conclude that for the purpose of recovering the biasing function, for the range of Gaussian
cosmological models considered, $C_{\ln}$ is a good approximation for $C$.

The proximity of $C$ and $C_{\ln}$ could have been alternatively evaluated by the Kolmogorov-Smirnov
(KS) statistic, $D = \max\{|\Delta C|\}$. For computing the KS significance $q(D)$, we
estimate the effective number of "independent" points by $N_{\rm eff} = V_{\rm box}/V_{\rm win}$, where $V_{\rm box}$ is the
volume of the simulation box and $V_{\rm win}$ is the effective volume of the smoothing window. A
value of $q \sim 1$ ($D \ll 1$) corresponds to a good match, and $q \ll 1$ ($D \sim 1$) to a poor match.
For our τCDM simulation, with TH8 smoothing at $z = 0$ and 1, we obtain $D \simeq 0.01$ and
$q > 0.9999$, confirming that $C_{\ln}$ is a good fit. However, for the larger SCDM and OCDM
simulations, although $D$ is still only $\sim 0.015$, the corresponding $q$ values are at the level of
only a few percent. For TCDM and CR, where $D$ is 0.016 and 0.052 respectively, the values
of $q$ drop to the level of a fraction of a percent, and the discrepancy seems large. This KS
test indicates that the log-normal approximation is not always perfect for general purposes,
as has been argued in the literature. However, our direct tests reported above demonstrate
that the use of the log-normal approximation is adequate for the recovery of the mean biasing
function in all these cases.
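
The text does not state how $q(D)$ was evaluated. One common choice, adopted here purely as an assumption, is the asymptotic Kolmogorov series with the small-sample correction $\lambda = (\sqrt{N} + 0.12 + 0.11/\sqrt{N})\,D$. The sketch below combines it with $N_{\rm eff} = V_{\rm box}/V_{\rm win}$ for a top-hat window; both function names are hypothetical.

```python
import math

def tophat_volume(radius):
    """Volume of a top-hat (spherical) smoothing window of given radius."""
    return 4.0 / 3.0 * math.pi * radius ** 3

def ks_significance(d_stat, n_eff, terms=100):
    """Asymptotic KS significance q(D) for n_eff independent points.

    Assumption: q = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2), with
    lam = (sqrt(n) + 0.12 + 0.11/sqrt(n)) * D.
    """
    root_n = math.sqrt(n_eff)
    lam = (root_n + 0.12 + 0.11 / root_n) * d_stat
    if lam < 0.1:
        return 1.0  # series converges slowly here; q is 1 to machine precision
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))
```

For example, a box of side 85 $h^{-1}$ Mpc with TH8 smoothing gives `n_eff = 85.0 ** 3 / tophat_volume(8.0)`, a few hundred effectively independent cells. The sketch illustrates why the same $D$ yields a much lower $q$ in a larger box; it is not guaranteed to reproduce the quoted $q$ values, which depend on how the effective window volume and the combined realizations were treated.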

We comment in passing that while the mass CDF is well approximated for our purpose
by a log-normal distribution, the shape of the halo (or galaxy) CDF is usually far from a
log-normal shape. This is implied by equation (4), from which it follows that $C_g(\delta_g) = C[\delta_g^{-1}(\delta_g)]$.
One does not expect to recover a log-normal distribution from a general functional form for
$\delta_g^{-1}$. In particular, the linear biasing model, which seems to be an acceptable approximation
in some cases with large smoothing (e.g., IRAS 1.2Jy galaxies at 12 $h^{-1}$ Mpc Gaussian
smoothing; Sigad et al. 1998), leads to a $C_g(\delta_g)$ that is far from log-normal. Trying to evaluate
the difference between $C_g$ and a log-normal distribution using the KS statistic, we obtain
for the halos in the τCDM simulation, with TH8 smoothing, both at $z = 0$ and 1, $D \simeq 0.08$
and $q \simeq 0.05$, namely a poor fit compared to the $q \sim 1$ of $C$ vs $C_{\ln}$. Similar conclusions are
valid for galaxies.

Our method for measuring the nonlinear biasing function requires an assumed value of
$\sigma$. Since $\sigma$ is known only to a limited accuracy (§@), we should check the robustness of our
results to errors in $\sigma$. We repeated the reconstruction described in §2, both for halos and for
galaxies, with perturbed values of $\sigma$ in a range ±20% about the true value of the simulation.
Not surprisingly, we find that the analog of the linear biasing parameter, $\hat{b}$, varies roughly in
proportion to $\sigma^{-1}$. We also find that $\tilde{b}$ varies in a similar way, such that the ratio $\tilde{b}/\hat{b}$, which
is the natural measure of nonlinear biasing (Dekel & Lahav 1999), is a very weak function of
$\sigma$, roughly $\tilde{b}/\hat{b} \propto \sigma^{0.15}$. This test indicates that our method provides a robust measure of the
nonlinearity in the biasing scheme that is to a large extent decoupled from the uncertainty
in the linear biasing parameter.



4. REDSHIFT DISTORTIONS 

The densities as measured in redshift space (z-space) are in general different from the
real-space (r-space) densities addressed so far, because the radial peculiar velocities distort
the volume elements along the lines of sight. One approach to deal with redshift distortions
is to start by recovering the full galaxy density field in r-space, using the linear or a mildly
nonlinear approximation to gravitational instability (e.g., Yahil et al. 1991; Strauss et al.
1992; Fisher et al. 1995; Sigad et al. 1998), and then compute the biasing function in r-space
as outlined above. The accuracy of such a procedure would be limited by the approximation
used for nonlinear gravity. Another difficulty with this approach is that it requires one to
assume a priori a specific biasing scheme, already in the force calculation that enters the
transformation from z-space to r-space, while this biasing scheme is the very unknown we
are after; this would require a nontrivial iterative procedure.

Fig. 5. Biasing functions in z-space (dashed) versus r-space (solid). The biasing functions are
derived from the corresponding TH8 CDFs of halos and mass in the τCDM simulation at $z = 0$.
Shown are halos of $M > 10^{12}\,h^{-1} M_\odot$ (left) and $M > 5 \times 10^{12}\,h^{-1} M_\odot$ (right).

The alternative we propose here is to actually use the z-space CDF, $C_{g,z}(\delta_{g,z})$, as provided
directly from counts in cells of galaxies in a redshift survey. If the redshift distortions affect
the densities of galaxies and mass in a similar way, then one may expect the biasing function
in z-space to be similar to the one in real space,

$$\langle\delta_{g,z}|\delta_z = \delta\rangle = \langle\delta_g|\delta\rangle . \qquad (11)$$

If we only had a robust functional form for the mass CDF in z-space, $C_z(\delta_z)$, then we could
compute the desired biasing function all in z-space, using equation (4) but with the analogous
z-space quantities. We thus need to test the validity of equation (11), and come up with a
useful approximation for $C_z(\delta_z)$.

Figure 5 illustrates the accuracy of equation (11). It compares the biasing functions in
z-space and r-space, as derived via equation (4) and its z-space analog from the corresponding
CDFs of halos and mass in the τCDM simulation with TH8 smoothing. The two curves are
remarkably similar for $\delta < 0.6$-$0.8$, roughly out to the 1-sigma rms fluctuation value. This
is roughly the range where the biasing scatter is reasonably small and our basic method is
applicable (§2, Figure 1). The curves deviate gradually as $\delta$ increases, partly due to stronger
"fingers of god" effects at high densities. The deviation is somewhat weaker for larger-mass
halos (perhaps due to a lower velocity dispersion for more massive objects as a result of
dynamical friction).

The direction of the deviation from equation (11), as seen in Figure 5, can be obtained
by applying linear theory of redshift distortions to the case of linear biasing in r-space,
$\delta_g = b\,\delta$. In linear theory, the density fluctuations in r-space and z-space are related via
$\delta_z = \delta\,[1 + f(\Omega_m)\mu^2]$, where $f(\Omega_m) \simeq \Omega_m^{0.6}$ (with a negligible dependence on $\Omega_\Lambda$; see Lahav et
al. 1991) and $\mu$ is the cosine of the angle between the galaxy velocity vector and the line of
sight. If the galaxies obey the continuity equation, then $\delta_{g,z} - \delta_g = \delta_z - \delta$, which implies the
following biasing relation in z-space:

$$\delta_{g,z} = \frac{b + f(\Omega_m)\mu^2}{1 + f(\Omega_m)\mu^2}\,\delta_z . \qquad (12)$$

Averaging over all possible directions and assuming $\Omega_m = 1$, we find that the linear biasing
parameter in z-space is predicted to be $b_z = (3b + 2)/5$ for the case shown in Figure 5. This
indicates that the linear biasing parameter tends to be closer to unity in z-space than in
r-space. Based on our empirical tests of equation (11), we learn that the nonlinear effects (of
biasing and gravity) conspire to make equation (11) a better approximation than implied by
the linear approximation.
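
The step from the continuity argument to equation (12) can be written out explicitly. The following is a worked restatement of the relations given in the text, with no additional assumptions beyond linear biasing $\delta_g = b\,\delta$:

```latex
\begin{align}
\delta_z &= \delta\,[1 + f(\Omega_m)\mu^2] , \\
\delta_{g,z} &= \delta_g + (\delta_z - \delta)
              = b\,\delta + f(\Omega_m)\mu^2\,\delta
              = [\,b + f(\Omega_m)\mu^2\,]\,\delta , \\
\frac{\delta_{g,z}}{\delta_z} &= \frac{b + f(\Omega_m)\mu^2}{1 + f(\Omega_m)\mu^2} .
\end{align}
```

For $b = 1$ the ratio equals unity for every $\mu$, while for $b > 1$ it lies between 1 and $b$, which is why the z-space biasing parameter is pulled toward unity.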

Note that while the results of Figure 5, based on our high-resolution $\tau$CDM simulation,
are quite accurate in the way they treat halos, they may suffer from significant cosmic
variance due to the relatively small volume sampled, where the presence (or absence) of a
few "fingers of god" could strongly affect the biasing function in the high-$\delta$ regime.
To test the validity of equation (11) with reduced cosmic variance, we appeal to yet another
set of N-body simulations (by Cole et al. 1997) which cover a much larger volume but with
lower resolution. These simulations followed the evolution of $N = 192^3$ particles in a
periodic box of comoving side $L = 345.6\,h^{-1}$ Mpc using an Adaptive P$^3$M code. The
cosmological models are $\Lambda$CDM ($\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, $h = 0.65$,
cluster-normalized to $\sigma_8 = 1.05$) and $\tau$CDM ($\Omega = 1$, $h = 0.25$,
cluster-normalized to $\sigma_8 = 0.55$). Nine mock catalogs were extracted from each of the
parent simulations, each containing $\sim 5 \times 10^5$ particles in a box of
$L = 200\,h^{-1}$ Mpc. The partial overlap between the catalog volumes is thus about 50%.
The central "observer" was chosen to mimic certain properties of the Local Group
environment (see Branchini et al. 1999).

Fig. 6. CDFs and biasing functions in r-space versus z-space, averaged over mock catalogs that
were extracted from the large $\Lambda$CDM simulation, with TH8 smoothing. Top: CDFs in r-space
(left) and z-space (right), for mass (solid) and for galaxies (dashed). Shown in comparison is
$C_{ln,z}$ with the $\sigma_z$ of the matter (dotted). Bottom left: the biasing functions as
derived from the CDFs in r-space (long-dash) and z-space (short-dash); they are very similar.
Also shown is the biasing function derived in z-space assuming $C_{ln,z}$ with $\sigma_z$
obtained using equation (15) (dotted line). Bottom right: absolute value of the difference
between the biasing functions: $\delta_{g,z}(\delta_z = \delta) - \delta_g(\delta)$ (dashed) and
$\delta_{ln,z}(\delta_z = \delta) - \delta_g(\delta)$ (dotted).

Since the resolution of these large simulations is inadequate for a detailed halo
identification based on many simulated particles in each halo, we identify individual
particles as galaxies using a Monte-Carlo procedure in which the galaxies are chosen to make
a random realization of an assumed nonlinear biasing function. Here we adopt the biasing
function proposed by Dekel & Lahav (1999) to fit the simulated results of Somerville et al.
(2000):

$$ \delta_g(\delta) = \begin{cases} (1 + b_0)(1 + \delta)^{b_{neg}} - 1 & \delta \le 0 \\ b_{pos}\,\delta + b_0 & \delta > 0 \end{cases} \qquad (13) $$

with $b_{neg} = 2$ and $b_{pos} = 1$. The mass density field is obtained with a Gaussian
smoothing of radius $5\,h^{-1}$ Mpc at the points of a $128^3$ cubic grid inside a box of size
$200\,h^{-1}$ Mpc. Galaxy densities are obtained at the grid points based on equation (13),
and then interpolated to the galaxy positions as defined by the selected particles. Given
the appropriate probability distributions $P(\delta)$, the value of $b_0$ is determined for
each choice of the parameters $b_{neg}$ and $b_{pos}$ such that $\langle\delta_g\rangle = 0$,
as required by definition. We obtain $b_0 = 0.26$ and $b_0 = 0.19$ for the models of
$\Lambda$CDM and $\tau$CDM respectively.
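The way $b_0$ follows from the zero-mean condition can be illustrated with a short Python sketch. It assumes the two-branch form of equation (13) with $b_{neg} = 2$ and $b_{pos} = 1$, and, purely for illustration, draws the mass fluctuations from a log-normal distribution with an assumed rms of 0.5 rather than from the simulated density field. The function names are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def biasing_function(delta, b0, b_neg=2.0, b_pos=1.0):
    """Equation (13): power law in (1 + delta) for delta <= 0, linear for delta > 0."""
    delta = np.asarray(delta, dtype=float)
    under = (1.0 + b0) * (1.0 + delta) ** b_neg - 1.0
    over = b_pos * delta + b0
    return np.where(delta <= 0.0, under, over)

def solve_b0(delta, b_neg=2.0, b_pos=1.0):
    """Return b0 such that the sample mean of delta_g vanishes.

    For fixed delta the mean of equation (13) is linear in b0,
    so two evaluations determine the root exactly.
    """
    m0 = biasing_function(delta, 0.0, b_neg, b_pos).mean()
    m1 = biasing_function(delta, 1.0, b_neg, b_pos).mean()
    return -m0 / (m1 - m0)

def lognormal_delta(sigma, size, rng):
    """Illustrative mass field: 1 + delta is log-normal with <delta> = 0 and rms sigma."""
    s2 = np.log1p(sigma**2)
    return np.exp(rng.normal(-0.5 * s2, np.sqrt(s2), size)) - 1.0

rng = np.random.default_rng(0)
delta = lognormal_delta(sigma=0.5, size=200_000, rng=rng)  # sigma = 0.5 is an assumed value
b0 = solve_b0(delta)
delta_g = biasing_function(delta, b0)  # galaxy fluctuation at each sampled point
```

Because the branches meet at $\delta_g(0) = b_0$, the function is continuous, and it is monotonic for $b_0 > -1$; the value of `b0` returned here depends on the assumed PDF and is not expected to reproduce the values quoted for the simulations.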
Figure 6 compares the CDFs and associated biasing functions in r-space and z-space,
averaged over the nine mock catalogs from the large-box $\Lambda$CDM simulation. The z-space
biasing function is indeed almost indistinguishable from the r-space one (bottom panels); the
differences are typically on the order of a couple of percent. The results for $\tau$CDM are
similar.

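In order to quantify this difference further, we define a statistic analogous to the error
statistic defined earlier:

$$ \Delta = \frac{1}{N_{bins}\,\sigma_g^2} \sum_{\delta\ \mathrm{bins}} \left[ \delta_{g,z}(\delta_z = \delta) - \delta_g(\delta) \right]^2 , \qquad (14) $$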

in which the first and second terms are the biasing functions as derived from the CDFs in
z-space and r-space respectively. The summation is over bins with $\delta < \delta_{max}$,
such that $\sim 99\%$ of the volume is accounted for. We also compute the two moments of the
observed biasing function, $\hat b_{obs}$ and $\tilde b_{obs}$. These three quantities,
averaged over the mock catalogs, are listed in Table 2 (second column). Their deviation from
the "true" values (Table 2, first column) is the systematic error. The quoted errors refer to
the 1$\sigma$ scatter about the mean; they represent the random errors. The results are listed
for the two models, $\Lambda$CDM and $\tau$CDM. We conclude that the biasing function and its
moments, as computed from the z-space CDFs, resemble those computed from the r-space CDFs to
within 2%. Note that the Monte Carlo procedure we use to generate mock catalogs artificially
reduces the amount of clustering and over-smooths the density fields for dark and luminous
particles. The net effect is to decrease the biasing moments by $\sim 7\%$, relative to the
values implied by the biasing scheme, equation (13). However, this bias does not affect the
present analysis, for which the "true" values are obtained from the mock catalogs themselves
and not from equation (13).
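A minimal Python sketch of the statistic of equation (14) follows. It assumes the normalization by $N_{bins}\,\sigma_g^2$, with both biasing functions tabulated on the same $\delta$ bins, and it picks $\delta_{max}$ as the value below which a chosen fraction of cells falls. The function names and the percentile-based choice of $\delta_{max}$ are illustrative assumptions rather than the original implementation.

```python
import numpy as np

def delta_statistic(dg_test, dg_ref, sigma_g):
    """Mean squared difference of two biasing functions, in units of sigma_g**2."""
    dg_test = np.asarray(dg_test, dtype=float)
    dg_ref = np.asarray(dg_ref, dtype=float)
    if dg_test.shape != dg_ref.shape:
        raise ValueError("both biasing functions must be sampled on the same delta bins")
    return float(np.mean((dg_test - dg_ref) ** 2) / sigma_g**2)

def delta_max_for_volume(delta_cells, volume_fraction=0.99):
    """Upper delta limit such that the given fraction of cells lies below it."""
    return float(np.quantile(np.asarray(delta_cells, dtype=float), volume_fraction))
```

In use, one would restrict the bins to $\delta < \delta_{max}$ before calling `delta_statistic`, passing the z-space function as `dg_test` and the r-space function as `dg_ref`.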



Our next task is to come up with a robust CDF for the mass in z-space. We try the
same log-normal distribution that was found robust for our purpose in r-space, but with
a proper rms in z-space, $\sigma_z$. Based on the linear approximation for Gaussian fields in
the small-angle limit (Kaiser 1987), we express $\sigma_z$ in terms of $\sigma$ and $\Omega_m$
of the cosmological model by:

$$ \sigma_z = \sigma \left[ 1 + \frac{2}{3} f(\Omega_m) + \frac{1}{5} f^2(\Omega_m) \right]^{1/2} . \qquad (15) $$

We thus approximate the z-space biasing function by $\delta_{ln,z}(\delta_z)$, as derived from
the z-space CDFs but where the mass CDF is replaced by a cumulative log-normal distribution
function $C_{ln,z}$ with standard deviation $\sigma_z$ (eq. 15). The resultant biasing
function, averaged over the mock catalogs, is displayed in the bottom panels of Figure 6. We
see that for $\Lambda$CDM the differences between $\delta_{ln,z}(\delta_z = \delta)$ and
$\delta_g(\delta)$ are at the level of a few percent. For $\tau$CDM they are only a bit
larger; they exceed 10% but only near $\delta \sim 2$, at the tail of the distribution. The
error in the biasing function, $\Delta$, defined in analogy to equation (14), and the biasing
moments, are listed in Table 2 (third column, marked "z-space ln"). The systematic error
$\Delta$ is still well below 2%, but the biasing parameters are systematically underestimated
by 4% and 7% in $\Lambda$CDM and $\tau$CDM respectively.
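The conversion from the r-space rms to the z-space rms can be sketched in a few lines of Python. The sketch assumes the Kaiser (1987) linear factor $1 + \frac{2}{3}f + \frac{1}{5}f^2$ for the variance and the approximation $f(\Omega_m) \simeq \Omega_m^{0.6}$ quoted earlier; the function names are illustrative.

```python
import math

def growth_rate(omega_m):
    """Approximate linear growth rate, f(Omega_m) ~ Omega_m**0.6."""
    return omega_m ** 0.6

def sigma_z(sigma, omega_m):
    """z-space rms mass fluctuation from the r-space rms in the linear Kaiser limit."""
    f = growth_rate(omega_m)
    return sigma * math.sqrt(1.0 + 2.0 * f / 3.0 + f * f / 5.0)
```

For $\Omega_m = 1$ the factor is $\sqrt{28/15}$, so the z-space rms is larger than the r-space rms by roughly a third; for lower $\Omega_m$ the boost is smaller.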






Overall, it seems that our straightforward method deals with redshift distortions fairly 
well, without any a priori assumption about the biasing scheme. 



5. SAMPLING ERRORS 

The accuracy of the derivation of the galaxy PDF is limited by two observational factors:
the finite volume sampled and the mean density of galaxies in the sample.^5

In principle, the limited volume is responsible for cosmic variance due to the fact that 
the long-wavelength fluctuations in the real universe are not fairly represented in the sampled 
volume. This is not of major concern for us here because (a) it is expected to introduce only 
a random error, and (b) as long as the biasing is local, the effects of long waves on the 
PDFs of galaxies and mass are expected to be correlated, making the local biasing function 
representative of the universal function despite the relatively small sampling volume. 

More important is the shot noise introduced by the combination of volume and sampling 
density effects. For a given cell size (or smoothing length), the error can be divided into the 
error in the count within each cell and the error due to the finite number of cells in the sample 
volume. These shot-noise sources may introduce both random and systematic errors. We 
evaluate them by computing the mean and standard deviation over a suite of mock catalogs 
in which we vary either the volume or the sampling density for a fixed smoothing scale. 

With TH8 smoothing, our mock catalogs from the large $\Lambda$CDM simulation contain
$N_{eff} \sim 3700$ independent cells. However, the currently available redshift surveys allow
an analysis in a much smaller volume. For example, a volume-limited subsample from the PSCz
catalog (Saunders et al. 2000) that is cut at a distance where the average galaxy separation
is $l = 8\,h^{-1}$ Mpc (i.e., on the order of our smoothing scale) contains only $\sim 600$
independent cells. We therefore estimate the error associated with reducing the sampled volume
such that $N_{eff} \sim 600$ in each mock catalog. We select from the simulation 9 such
non-overlapping sub-volumes, while keeping the sampling density and smoothing length fixed.
The results for $\Lambda$CDM, averaged over the mock catalogs, are shown in the upper panels
of Figure 7, and the results for the two cosmological models are summarized in Table 2
(column 4). We find no significant systematic errors due to the volume effect in a sample
like PSCz and with $\sim 8\,h^{-1}$ Mpc smoothing (except in the very high-$\delta$ tail for
$\tau$CDM). The corresponding random errors in the biasing parameters are 5% and 6% for
$\Lambda$CDM and $\tau$CDM respectively.

^5 The additional edge effects can be greatly minimized by using a volume-limited sample and
a proper choice of cell coverage (see Szapudi & Colombi 1996).

The sampling density can be parameterized by the mean galaxy separation, $l$. In our
large simulation $l = 2.5\,h^{-1}$ Mpc, much smaller than the smoothing length of
$8\,h^{-1}$ Mpc, but in real samples $l$ could be on the order of the smoothing length. To
test the effect of sampling density, we produce 9 mock catalogs in which galaxies are
sub-sampled at random from the original catalog such that the mean separation is $l = 6$, 8,
or $10\,h^{-1}$ Mpc, while the smoothing length and large volume are kept fixed with
$N_{eff} \sim 3700$. The results for $\Lambda$CDM are shown in the bottom panels of Figure 7,
and for the two models in Table 2 (columns 5-7). We see that the sparse sampling artificially
enhances both positive and negative density fluctuations, which enlarges the width of the
galaxy PDF. This results in a steeper biasing function. For $\Lambda$CDM, the effect becomes
noticeable only when $l > 8\,h^{-1}$ Mpc, where the systematic error in the biasing parameters
is of order 10% and larger, and $\Delta$ is of order a few percent. For $\tau$CDM the
sampling-density effect is noticeable already for $l \sim 6\,h^{-1}$ Mpc, with the error
reaching 30-50% at $l \sim 10\,h^{-1}$ Mpc. A plausible explanation for why the sparse
sampling is more damaging in the $\tau$CDM model is that the clustering in this model is
weaker ($\sigma_8$ is smaller to match the cluster abundance, which constrains
$\sigma_8 \Omega_m^{0.5}$), and therefore the high-density regions are poorly sampled by
galaxies.
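The random sub-sampling described above can be sketched as follows. The sketch assumes that the mean separation is defined through the number density as $l = n^{-1/3}$, so that diluting a catalog from $l_{parent}$ to $l_{target}$ means keeping each galaxy with probability $(l_{parent}/l_{target})^3$. The function names are hypothetical.

```python
import numpy as np

def keep_fraction(l_parent, l_target):
    """Fraction of galaxies to keep so the mean separation grows from l_parent to l_target."""
    if l_target < l_parent:
        raise ValueError("sub-sampling can only increase the mean separation")
    return (l_parent / l_target) ** 3  # number density scales as l**-3

def subsample_to_separation(positions, l_parent, l_target, rng):
    """Randomly dilute a catalog (array of shape [N, 3]) to the target mean separation."""
    positions = np.asarray(positions)
    mask = rng.random(len(positions)) < keep_fraction(l_parent, l_target)
    return positions[mask]
```

Going from $l = 2.5$ to $10\,h^{-1}$ Mpc, for example, keeps only 1 galaxy in 64, which illustrates how quickly the shot noise per cell grows at fixed smoothing length.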

In summary: the main source of error in our analysis is the sparse sampling. For
recovering the biasing function with TH8 smoothing, the mean separation should be
$< 8\,h^{-1}$ Mpc.

Fig. 7. Sampling errors due to finite volume (top) and sparse sampling (bottom), for fixed TH8
smoothing, estimated from the large $\Lambda$CDM simulation. Shown are the CDFs in real space
(left), the derived biasing function (middle), and the error in it (right); the horizontal
axis in each panel is $\delta$. The mass CDF is marked by a solid line, and galaxy CDFs by
broken lines. Top: volumes of $N_{eff} = 3700$ and 600 are marked by long-dashed and dotted
lines respectively. Bottom: samples of galaxy separation $l = 2.5$, 8, and $10\,h^{-1}$ Mpc
are marked by long-dashed, short-dashed, and dotted lines respectively.



6. CONCLUSION 

We propose a simple prescription for recovering the mean nonlinear biasing function
from a large redshift survey. The biasing function is defined by
$b(\delta)\,\delta = \langle\delta_g|\delta\rangle$, and is characterized to second order by
two parameters, $\hat b$ and $\tilde b$, measuring the mean biasing and its nonlinearity
respectively. The method is applied at a given cosmology, time, object type and smoothing
scale, and involves one parameter that should be assumed a priori: the rms mass density
fluctuation $\sigma$ on the relevant scale.

The main steps of the algorithm are as follows:

1. Obtain the observed cumulative distribution function in redshift space,
$C_{g,z}(\delta_{g,z})$, by counts in cells or with window smoothing at a certain smoothing
length.

2. Assume a value for $\sigma$ on that scale and for the cosmological density parameter
$\Omega_m$, and approximate the mass CDF in redshift space by $C_{ln,z}(\delta_z; \sigma_z)$,
the cumulative log-normal distribution, with the width $\sigma_z$ derived from $\sigma$ and
$\Omega_m$ by equation (15).

3. Derive the mean biasing function by

$$ \delta_g(\delta = \delta_z) \simeq \delta_{g,z}(\delta_z) = C_{g,z}^{-1}\left[ C_{ln,z}(\delta_z; \sigma_z) \right] . \qquad (16) $$
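The three steps can be sketched in Python as below. The sketch makes several assumptions that should be checked against the definitions used earlier in the paper: the log-normal CDF is taken in the standard zero-mean form, in which $\ln(1+\delta)$ is Gaussian with variance $s^2 = \ln(1+\sigma^2)$ and mean $-s^2/2$; $\sigma_z$ follows the linear Kaiser factor with $f \simeq \Omega_m^{0.6}$; and the inverse galaxy CDF is approximated by empirical quantiles of the cell overdensities. All function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without extra dependencies

def lognormal_cdf(delta, sigma):
    """Cumulative log-normal distribution of 1 + delta with <delta> = 0 and rms sigma."""
    s2 = np.log1p(sigma**2)
    x = (np.log1p(np.asarray(delta, dtype=float)) + 0.5 * s2) / math.sqrt(2.0 * s2)
    return 0.5 * (1.0 + _erf(x))

def sigma_redshift(sigma, omega_m):
    """Step 2: z-space rms from the assumed r-space rms (linear Kaiser factor)."""
    f = omega_m ** 0.6
    return sigma * math.sqrt(1.0 + 2.0 * f / 3.0 + f * f / 5.0)

def biasing_function_from_cdf(delta_gz_cells, delta_z_grid, sigma, omega_m):
    """Step 3: delta_g(delta) = C_gz^{-1}[C_ln,z(delta_z; sigma_z)].

    delta_gz_cells : observed galaxy overdensities in cells (step 1)
    delta_z_grid   : mass overdensities (> -1) at which to evaluate the biasing function
    """
    prob = lognormal_cdf(delta_z_grid, sigma_redshift(sigma, omega_m))
    # The empirical quantile function plays the role of the inverse galaxy CDF.
    return np.quantile(np.asarray(delta_gz_cells, dtype=float), prob)
```

Using quantiles avoids binning the galaxy CDF explicitly; the price is that the result becomes noisy in the tails, where few cells contribute, which matches the restriction to the range of $\delta$ covering most of the volume.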

We first showed that the mean biasing function, at TH8 smoothing, can be derived
with reasonable accuracy from the r-space CDFs of galaxies (or halos) and mass, despite the
biasing scatter. We then demonstrated that for a wide range of CDM cosmologies the mass
CDF can be properly approximated for this purpose by a log-normal distribution of the same
width $\sigma$. Next we showed that the biasing functions in z-space and r-space are very
similar, and that the z-space mass CDF can also be approximated by a log-normal distribution,
with a width derived from $\sigma$ via equation (15). This allows us to apply the method
directly to the observed CDF in a redshift survey. The errors in the recovered biasing
function and its moments, in an ideal case of dense sampling in a large volume, are at the
level of a few percent.

In any realistic galaxy survey the limited volume and discrete sampling introduce further
random and systematic errors. For a survey like the PSCz survey, the main source of error is
the sampling density; the error does not exceed $\sim 10\%$ as long as the mean observed
galaxy separation is kept smaller than the smoothing radius. We are currently in the process
of applying this method to the PSCz survey (E. Branchini et al. 2000, in preparation), where
a more specific error analysis will be carried out. The sampling errors are expected to be
significantly smaller for the upcoming 2dF and SDSS redshift surveys.

We showed above that our method works well both for halos and for galaxies, on scales
5 to $15\,h^{-1}$ Mpc, and in the redshift range $0 \le z \le 3$ over which the biasing is
expected to change drastically. We obtain a similar accuracy when we vary the cosmological
model, the mass of the halos in the comparison, or galaxy properties such as morphological
type and luminosity. The approximation $\delta_g(\delta)$ is consistent (the deviation is
less than 1$\sigma$) with the true average biasing function $\langle\delta_g|\delta\rangle$
over a wide range of $\delta$ values, which covers 98-99% of the volume, depending on redshift
and the type of biased objects. This allows us to estimate the moments of the biasing
function to within a few percent (see Table 1). The moments of the biasing function are
derived from 99.9% of the volume (99% at $z = 3$ and for relative biasing).

The method requires as external parameters the rms mass-density fluctuation $\sigma$ and
the cosmological parameter $\Omega_m$. These can be obtained by joint analyses of constraints
from several observational data sets, such as the cluster abundance (e.g., Eke et al. 1998),
peculiar velocities (e.g., Dekel & Rees 1994; Zaroubi et al. 1997; Freudling et al. 1999),
CMB anisotropies (e.g., de Bernardis et al. 1999), and type Ia supernovae (Riess et al. 1998;
Perlmutter et al. 1999). Examples of such joint analyses are Bahcall et al. (1999) and Bridle
et al. (1999).

The method is clearly applicable at $z \sim 0$ with available redshift surveys and especially
with those that will become available in the near future, 2dF and SDSS. In the future, this
method may become applicable at higher redshifts as well, where the biasing plays an even
more important role. With the accumulation of Lyman-break galaxies at $z \sim 3$, it may soon
become feasible to reconstruct their PDF by counts in cells, and our method will allow a
recovery of the biasing function at this early epoch, with consequences for galaxy formation
and the evolution of structure.

We have concentrated here on smoothing scales relevant to galaxy biasing, but the
method may also be applicable for the biasing of galaxy clusters, on scales of a few tens of
Mpc. The biasing scatter may be larger for clusters because of their sparse sampling, but
the larger mean biasing parameter for clusters may help in regaining the monotonicity
required for the CDF mapping to provide a valid approximation to the mean biasing function.
The mass PDF has been checked to be properly approximated by a log-normal distribution
at smoothing scales in the range 20 to $40\,h^{-1}$ Mpc, using simulations of the standard CDM
and Cold+Hot DM models (Borgani et al. 1995). The errors due to sparse sampling would
require a smoothing scale at the high end of this range.

In a large redshift survey which distinguishes between object types, one can measure
the relative biasing function between two object types by applying the same CDF mapping in
redshift space, using the observed CDFs for the two types without appealing to the underlying
mass distribution at all. The upcoming large redshift surveys 2dF and SDSS, and the
DEEP survey at $z \sim 1$, are indeed expected to provide adequate samples of different galaxy
types. Compared with the predictions of simulations and semi-analytic modeling of galaxy
formation (e.g., Kauffmann et al. 1999; Benson et al. 1998; Baugh et al. 1999; Somerville &
Primack 1999), the measured relative biasing function can provide valuable constraints on
the formation of galaxies and the evolution of structure.

While the method outlined above for measuring the mean nonlinear biasing function is being
implemented using current and future redshift surveys, the next challenge is to devise a
practical method for measuring the biasing scatter about the mean.

lating discussions, and V. Narayanan and M. Strauss for a helpful referee report. EB thanks
the Hebrew University for its hospitality. This work was supported by the Israel Science
Table 1: Recovery of the biasing function from the CDFs

                                                          [...] vs. late type
            z=0     z=1     z=3     z=0     z=1     z=3     z=0     z=1     z=3
b           0.90    2.18    6.62    0.93    1.71    AAA     1.17    1.34    1.27
k           0.89    2.28    6.75    0.93    1.75    4.32    1.18    1.39    1.50
b           0.93    2.20    7.85    0.95    1.71    4.62    1.18    1.35    1.31
k           0.96    2.30    8.00    0.98    1.76    4.63    1.26    1.46    1.65
$\Delta$    0.08    0.07    0.20    0.08    0.04    0.08    0.22    0.20    0.54

Table 2: Redshift distortions and sampling errors in the biasing function

$\Lambda$CDM
            True     z-space           z-space ln       Volume           l = 6^a          l = 8^a          l = 10^a
$\hat b$    1.13     1.12 ± 0.006      1.09 ± 0.02      1.12 ± 0.05      1.17 ± 0.05      1.23 ± 0.04      1.31 ± 0.06
$\tilde b$  1.14     1.12 ± 0.006      1.10 ± 0.02      1.13 ± 0.05      1.17 ± 0.05      1.24 ± 0.04      1.32 ± 0.06
$\Delta$             0.001 ± 0.001     0.002 ± 0.001    0.005 ± 0.006    0.006 ± 0.006    0.016 ± 0.010    0.049 ± 0.028

$\tau$CDM
            True     z-space           z-space ln       Volume           l = 6^a          l = 8^a          l = 10^a
$\hat b$    1.188    1.18 ± 0.002      1.11 ± 0.02      1.21 ± 0.06      1.35 ± 0.07      1.55 ± 0.07      1.80 ± 0.07
$\tilde b$  1.192    1.18 ± 0.002      1.11 ± 0.02      1.21 ± 0.06      1.36 ± 0.07      1.55 ± 0.07      1.81 ± 0.07
$\Delta$             0.002 ± 0.0003    0.016 ± 0.011    0.072 ± 0.063    0.177 ± 0.178    0.563 ± 0.368    1.505 ± 0.564

^a in units of $h^{-1}$ Mpc



